Analyzing companies’ revenues and the labor workforce in Vietnam
Introduction
In this report, I give a snapshot of how company revenues across the 63 provinces of Vietnam and the Vietnamese labor force changed over the period 2009-2019.
Data source:
- All data used in the report (e.g., companies’ revenue, population) were retrieved from the official website of the General Statistics Office (GSO) of Vietnam.
- The shapefile (map data) comes from GADM, retrieved through the R package raster. Unfortunately, it does not include the shapes of Hoang Sa and Truong Sa. Please contact me by e-mail if you have a shapefile that includes them.
If you are interested in how the data were cleaned before visualization, please check out my GitHub repo.
Because the GSO data are aggregated at the province level, I’ll first show you a map of Vietnam, which can help you locate each province. When you point at a province, you’ll see its name and other information, such as area and population.
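library(here)  # here() builds paths relative to the project root
# Load the helper that draws the interactive province map
source(here("echart-functions/map-ggplotly.r"))
ggplotly_population_vn()
Company revenue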
Overall
This density graph shows the distribution of total company revenues across all provinces of Vietnam from 2015 to 2018. The higher the line, the more provinces fall at that revenue level.
- Ha Noi (the capital) and Ho Chi Minh City were the leading provinces, with the largest total company revenues.
- In most provinces, total company revenue is under 20,000 billion VND per year. Only around 10 of the 63 provinces have companies that generate more than this.
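The count behind the last bullet can be checked with a simple filter. This is only a minimal sketch on made-up numbers: the `revenue_toy` data frame, its province labels, and its values are hypothetical, and only the 20,000 billion VND threshold comes from the text above.

```r
library(dplyr)

# Hypothetical toy data: one row per province for a single year (values are made up)
revenue_toy <- tibble(
  tinh_thanh_pho  = c("A", "B", "C", "D", "E"),
  company_revenue = c(150000, 42000, 18000, 9500, 3000)  # billions VND
)

threshold <- 20000  # billions VND, the cut-off discussed in the text

# Keep the provinces above the threshold, then count them
above       <- revenue_toy %>% filter(company_revenue > threshold)
n_above     <- nrow(above)
share_above <- n_above / nrow(revenue_toy)
```

On the real `company` table the same filter would be applied per year, after dropping the rows the report already excludes.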
# Rows to label in the chart: the 3 lowest and the 11 highest province-year revenues
company_outlier <- company %>%
filter(year != 2010, !is.na(company_revenue),
tinh_thanh_pho != "Không xác định") %>%
arrange(company_revenue) %>% slice(1:3, (n()-10):n())
company %>%
filter(year != 2010, !is.na(company_revenue),
tinh_thanh_pho != "Không xác định") %>%
ggplot(aes(company_revenue, year)) +
ggridges::geom_density_ridges(alpha = 0.7)+
ggrepel::geom_text_repel(
position = position_nudge(),
aes(company_revenue, year, label = tinh_thanh_pho),
color = "red",
inherit.aes = FALSE,
data = company_outlier
) +
geom_point(
aes(company_revenue, year),
color = "red",
inherit.aes = FALSE,
data = company_outlier
) +
geom_vline(xintercept = 20000, linetype =3, color = "firebrick")+
scale_x_continuous(label = comma, breaks = c(seq(-20000, 200000, 40000))) +
labs(title = "Total company revenue of each province \n",
x = "Billions VND",
y = "")
Next, we look at the provinces whose companies generate the largest and the smallest total revenues.
The high revenue club
- Companies in Ha Noi gradually caught up with those in Ho Chi Minh City in total revenue.
- Despite having relatively few companies, Bac Ninh and Thai Nguyen generated very large revenues.
company %>%
filter(year != 2010, !is.na(company_revenue),
tinh_thanh_pho != "Không xác định") %>%
# Keep the 8 provinces with the largest summed absolute revenue; lump the rest into one level
mutate(tinh_thanh_pho = fct_lump(tinh_thanh_pho,
w = abs(company_revenue),
n = 8, other_level = "Avg of the others")) %>%
group_by(tinh_thanh_pho) %>%
arrange(year) %>%
mutate(n_company = last(n_company)) %>%
ungroup() %>%
group_by(tinh_thanh_pho, year) %>%
summarise(avg_revenue = mean(company_revenue),
n_company_2018 = mean(n_company),
.groups = "drop") %>%
mutate(bn_tn = tinh_thanh_pho %in% c("Bắc Ninh", "Thái Nguyên")) %>%
mutate(tinh_thanh_pho = glue("{tinh_thanh_pho} ({comma(n_company_2018)})"),
tinh_thanh_pho = fct_reorder(tinh_thanh_pho, abs(avg_revenue))) %>%
ggplot(aes(avg_revenue, tinh_thanh_pho, fill = bn_tn)) +
geom_col(show.legend = FALSE) +
scale_x_continuous(label = comma) +
scale_fill_manual(values = c("#66B38BF6", "#A14442")) +
expand_limits(x = 200000) +
facet_wrap( ~ year, nrow = 2, scale = "free_x") +
labs(
title = "Top 8 provinces by total company revenue",
subtitle = "(n) is the number of companies in 2018 \n",
x = "Billions VND",
y = NULL
)+
theme_facet
The smallest total revenues club
- Companies in Ha Tinh generated the lowest total revenues for three consecutive years, from 2015 to 2017.
- Notably, in 2018 the total revenue of companies in Thanh Hoa fell sharply, so Thanh Hoa replaced Ha Tinh at the bottom.
company %>%
filter(year != 2010, !is.na(company_revenue),
tinh_thanh_pho != "Không xác định") %>%
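# Inverse weights give the largest weight to the smallest revenues, so fct_lump keeps the
# 8 lowest-revenue provinces; the 1,000,000 offset keeps the weights decreasing in revenue
# even when revenue is negative (as long as it stays above -1,000,000)
mutate(tinh_thanh_pho = fct_lump(tinh_thanh_pho,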
w = 1/abs(company_revenue+1000000),
n = 8, other_level = "Avg of the others")) %>%
group_by(tinh_thanh_pho) %>%
arrange(year) %>%
mutate(n_company = last(n_company)) %>%
ungroup() %>%
group_by(tinh_thanh_pho, year) %>%
summarise(avg_revenue = mean(company_revenue),
n_company_2019 = mean(n_company),
.groups = "drop") %>%
mutate(tinh_thanh_pho = glue("{tinh_thanh_pho} ({comma(n_company_2019)})"),
tinh_thanh_pho = fct_reorder(tinh_thanh_pho, abs(avg_revenue))) %>%
ggplot(aes(avg_revenue, tinh_thanh_pho, fill = avg_revenue < 0)) +
geom_col(show.legend = FALSE) +
scale_x_continuous(label = comma) +
scale_fill_manual(values = c("#66B38BF6", "#A14442")) +
facet_wrap( ~ year, nrow = 2, scale = "free_x") +
labs(
title = "Provinces with the smallest total company revenues",
subtitle = "(n) is the number of companies in 2018 \n",
x = "Billions VND",
y = NULL
)+
theme_facet
Labor workforce
Labor force over total population
In this chart, I use the employment-to-population ratio to compare provinces, because taking each province's total population into account makes the comparison fairer.
# read data
labor <- read_rds("cleanded-data/labor.rds") %>%
filter(tinh_thanh_pho != "Bắc Trung Bộ và duyên hải miền Trung") %>%
rename(n_labor = value) %>%
mutate(clean_name = janitor::make_clean_names(tinh_thanh_pho),
clean_name = str_remove(clean_name, "_[1-9]"),
clean_name = str_remove(clean_name, "tp_"))
danso <- read_rds("cleanded-data/area-population.rds") %>%
rename(tinh_thanh_pho = dia_phuong,
pop_1000 = dan_so_trung_binh_nghin_nguoi,
pop_density = mat_do_dan_so_nguoi_km2) %>%
mutate(clean_name = janitor::make_clean_names(tinh_thanh_pho),
clean_name = str_remove(clean_name, "_[1-9]"))
labor_over_pop <- labor %>%
inner_join(danso, by = c("year", "clean_name")) %>%
filter(year == 2018) %>%
mutate(labor_over_pop = n_labor / 1000 / pop_1000) %>%
filter(dense_rank(labor_over_pop) <= 10 |
dense_rank(desc(labor_over_pop)) <= 10) %>%
mutate(
tinh_thanh_pho.x = fct_reorder(tinh_thanh_pho.x, labor_over_pop),
high_low = as.numeric(tinh_thanh_pho.x) > 10
)
labor_over_pop %>%
ggplot(aes(labor_over_pop, tinh_thanh_pho.x, fill = high_low)) +
geom_col(show.legend = FALSE) +
geom_text(aes(x = labor_over_pop / 2, label = 100 * round(labor_over_pop, 3)),
color = "white") +
geom_vline(xintercept = mean(labor_over_pop$labor_over_pop),
linetype = 3,
color = "orange") +
geom_label(
label = glue("Average 10.8%"),
x = .13,
y = 3,
inherit.aes = FALSE
) +
scale_x_continuous(labels = percent, n.breaks = 10) +
scale_fill_manual(values = c("#66B38BF6", "#A14442")) +
labs(
title = "Employment to population ratio by province in 2018",
subtitle = "10 highest and 10 lowest (% scale)",
x = "",
y = ""
) +
theme(axis.text.x = element_blank(), line = element_blank())
The chart shows the share of employed people in the total population for 20 provinces: the 10 with the highest ratios and the 10 with the lowest. Binh Duong had the highest employment-to-population ratio, followed by Ho Chi Minh City and Ha Noi.
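The ratio and the highest/lowest selection used for this chart can be illustrated on a small example. This is a sketch under stated assumptions: the `toy` data frame and its values are made up, `k` is a hypothetical parameter standing in for the 10 used in the report, and `n_labor` is assumed to be counted in persons (which the division by 1000 in the original code suggests), while `pop_1000` is in thousands of persons.

```r
library(dplyr)

# Hypothetical toy data (values are made up).
# Assumption: n_labor is in persons, pop_1000 is in thousands of persons.
toy <- tibble(
  province = c("A", "B", "C", "D", "E", "F"),
  n_labor  = c(300000, 120000, 50000, 60000, 20000, 400000),
  pop_1000 = c(2000, 1500, 1000, 600, 500, 2500)
)

k <- 2  # keep the k highest and k lowest ratios (the report uses 10)

ratio_toy <- toy %>%
  # Convert n_labor to thousands so both columns share the same unit
  mutate(labor_over_pop = n_labor / 1000 / pop_1000) %>%
  # dense_rank() ranks ascending; wrapping the column in desc() ranks descending
  filter(dense_rank(labor_over_pop) <= k |
           dense_rank(desc(labor_over_pop)) <= k) %>%
  arrange(desc(labor_over_pop))
```

Because `dense_rank()` gives tied values the same rank, this filter can return more than `2 * k` rows when several provinces share exactly the same ratio.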
To make this clearer, I also show the indicator on a map. You can filter provinces by interacting with the bar on the left.
geojson_map <- read_rds("cleanded-data/geojson_vnmap.rds")
labor %>%
inner_join(danso, by = c("year", "clean_name")) %>%
mutate(labor_over_pop = 100 * n_labor / 1000 / pop_1000) %>%
filter(year == 2018) %>%
e_charts(clean_name) %>%
e_map_register("vn", geojson_map) %>%
e_map(labor_over_pop, map = "vn") %>%
e_visual_map(labor_over_pop) %>%
e_theme("infographic") %>%
e_title(text = "Employment to population ratio by province (in 2018)")
The map shows that in the north, workers were concentrated in Ha Noi and the adjacent provinces, while in the south they were concentrated in Ho Chi Minh City and Binh Duong. Central Vietnam looks sparse in terms of labor even though its population is not; only Da Nang is comparable to the other centers.
Labor force by age group
In the last part, I visualize Vietnam's labor force by age group.
labor_by_age %>%
filter(nhom_tuoi != "total") %>%
# Use each band's lower bound + 2.5 as its representative age (assumes 5-year bands)
mutate(age = as.numeric(str_sub(nhom_tuoi, 1, 2)) + 2.5) %>%
group_by(year) %>%
mutate(weight = n_labor / sum(n_labor) * age) %>%
summarise(avg_age = sum(weight)) %>%
ggplot(aes(year, avg_age))+
geom_point()+
geom_line()+
geom_text(aes(label = round(avg_age,1)), position = position_nudge(y= .2))+
scale_x_continuous(breaks = seq(2009,2019,1))+
labs(x= "", y= "",
title = "Avg age of labor force")
After 10 years, the average age of the labor force had increased by about 3 years, reaching nearly 40.
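The average age above is a weighted mean of age-band midpoints, where each band's weight is its share of the labor force. A minimal worked sketch, assuming 5-year bands labelled by their lower bound; the `age_toy` data frame and its counts are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data for a single year (counts are made up).
# Assumption: age groups are 5-year bands whose label starts with the lower bound.
age_toy <- tibble(
  nhom_tuoi = c("15-19", "20-24", "25-29"),
  n_labor   = c(100, 300, 600)
)

avg_age_toy <- age_toy %>%
  mutate(
    midpoint = as.numeric(substr(nhom_tuoi, 1, 2)) + 2.5,  # 17.5, 22.5, 27.5
    share    = n_labor / sum(n_labor)                      # 0.1, 0.3, 0.6
  ) %>%
  # Weighted mean: sum of share * midpoint over all bands
  summarise(avg_age = sum(share * midpoint)) %>%
  pull(avg_age)
```

Here the result is 0.1 * 17.5 + 0.3 * 22.5 + 0.6 * 27.5 = 25. If the oldest band were open-ended, lower bound + 2.5 would likely understate its true mean age, so the estimate is best read as approximate.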
labor_by_age %>%
filter(nhom_tuoi != "total") %>%
group_by(year) %>%
# Share of each age group within its year (mutate keeps one row per group and year)
mutate(pct = n_labor / sum(n_labor)) %>%
ungroup() %>%
ggplot(aes(year, pct, fill = nhom_tuoi)) +
geom_col() +
geom_text(position = position_stack(vjust= 0.5) , aes(label = round(pct * 100)),color = "white") +
scale_y_continuous(label = percent) +
scale_x_continuous(breaks = seq(2009, 2019, 1)) +
scale_fill_brewer(palette = 5,type = "div")+
coord_flip()+
labs(x = "",
y = "",
fill = "",
title = "Percentage of labor force by age group")
We can see the beginning of the next wave: the share of workers in the intermediate age group is fairly stable (around 13%), while the share of the youngest group is shrinking. This means the labor force is getting older and Vietnam is slowly losing the advantage of its golden age. This wave will take about 10 years to take effect. When a large part of the population is retired, pension payments will become a heavy burden on the economy.